Cross layer resiliency in real world
Abstract
Resilience at different levels of the design hierarchy will be needed in complex SoCs to handle failures due to variability, reliability, and design errors (logical or electrical). The main reasons for marginal behavior are sheer design complexity, uncertainties in manufacturing processes, temporal variability, and operating conditions. In this session, we will cover the basics of cross-layer resiliency and explore the reliability challenges in both embedded processors and large-scale computing resources.

Talk 1: Cross-layer resiliency exploration and optimization (Subhasish Mitra)
This talk will discuss systematic methodologies for exploring cross-layer resilience, encompassing error detection, correction, and recovery techniques, for complex SoCs consisting of processor cores, accelerators, and uncore components. The objective is to answer several key questions: (i) Given a design, is cross-layer resilience always the best way to protect that design? While cross-layer resilience is generally considered the right way forward, its benefits still need to be quantified systematically. (ii) What are the right models that link resilience techniques across multiple layers for quick, yet accurate, estimation of costs in terms of power/energy, performance, and area? (iii) What is the proper framework to explore the large space of existing resilience techniques for error detection, correction, and recovery across various abstraction layers?

Talk 2: Reliability challenges in embedded processors (Vikas Chandra)
Embedded processors are now at the heart of the mobile revolution and aspire to power even high-performance data centers. It is of utmost importance to understand the reliability challenges in embedded processors and to find ways to tackle them across different layers of design abstraction. In this talk, I will discuss the reliability requirements of embedded processors, the challenges we are facing, and our approach to making designs more robust. We will discuss our approaches to measuring wearout in commercial processors as well as the efficient design of in-situ monitors to track timing errors.

Talk 3: Billion chips of trillion transistors: How to make them reliable? (Chen-Yong Cher, Silvia Melitta Mueller)
Due to increasing demand for personal devices, high-performance computing systems, and commercial data centers, microprocessor and main memory designers face numerous challenges in delivering large numbers of chips at an effective cost. While frequency scaling has effectively ended, technology scaling continues to provide an increasing number of transistors. To utilize these transistors effectively for performance, designers turn to sophisticated and highly integrated chip designs such as multi-core (e.g., Intel i7, IBM POWER7, BlueGene/Q), GPGPU (e.g., NVIDIA Tegra), and heterogeneous SoC (e.g., IBM Wirespeed). The increasing demand for chips and transistors presents numerous challenges in reliability, power, and manufacturing cost. In large-scale HPC systems and data centers, the increasing number of chips also raises the per-chip reliability requirement needed to achieve system reliability targets.
Publication date: 2014